Week 6: Analysis of longitudinal data

After working hard with multivariate, mostly exploratory, even heuristic techniques that are common in data science, the last topic of IODS course will take us back in the task of building statistical models.

The new challenge is that the data will include two types of dependencies simultaneously: In addition to the correlated variables that we have faced with all models and methods so far, the observations of the data will also be correlated with each other.

Lets start this weeks session

Usually, we can assume that the observations are not correlated - instead, they are assumed to be independent of each other. However, in longitudinal settings this assumption seldom holds, because we have multiple observations or measurements of the same individuals. The concept of repeated measures highlights this phenomenon that is actually quite typical in many applications. Both types of dependencies (variables and observations) must be taken into account; otherwise the models will be biased.

To analyze this kind of data sets, we will focus on a single class of methods, called linear mixed effects models that can cope quite nicely with the setting described above.

Before we consider two examples of mixed models, namely the random intercept model and the random intercept and slope model, we will learn how to wrangle longitudinal data in wide form and long form, take a look at some graphical displays of longitudinal data, and try a simple summary measure approach that may sometimes provide a useful first step in these challenges. In passing, we “violently” apply the usual “fixed” models (although we know that they are not the right choice here) in order to compare the results and see the consequences of making invalid assumptions.

Load the packages first !!

6.1 Packages for Week 6 !!

#load required packages
library(ggplot2)
library(corrplot)
## corrplot 0.92 loaded
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ✔ purrr   0.3.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(dplyr)
library(stringr)
library(psych) 
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(FactoMineR)
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

That’s a lot of packages !!!

Analysis of RATS data

Lets implement the analyses of Chapter 8 of MABS, using the R codes of Exercise Set 6: Meet and Repeat: PART I but using the RATS data (from Chapter 9 and Meet and Repeat: PART II) as instructed in the Moodle.

6.2 Loading the data

# read long format data of rats
RATSL <- read.csv('RATSL.csv')

# Lets convert categorical data to factors first 
## WE have ID and Group (Just like wrangling exercise)
RATSL$ID <- factor(RATSL$ID)
RATSL$Group <- factor(RATSL$Group)

# glimpse and dimensions
head(RATSL);dim(RATSL)
##   ID Group  WD Weight Time
## 1  1     1 WD1    240    1
## 2  2     1 WD1    225    1
## 3  3     1 WD1    245    1
## 4  4     1 WD1    260    1
## 5  5     1 WD1    255    1
## 6  6     1 WD1    260    1
## [1] 176   5
str(RATSL)
## 'data.frame':    176 obs. of  5 variables:
##  $ ID    : Factor w/ 16 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Group : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 2 2 ...
##  $ WD    : chr  "WD1" "WD1" "WD1" "WD1" ...
##  $ Weight: int  240 225 245 260 255 260 275 245 410 405 ...
##  $ Time  : int  1 1 1 1 1 1 1 1 1 1 ...
summary(RATSL)
##        ID      Group       WD                Weight           Time      
##  1      : 11   1:88   Length:176         Min.   :225.0   Min.   : 1.00  
##  2      : 11   2:44   Class :character   1st Qu.:267.0   1st Qu.:15.00  
##  3      : 11   3:44   Mode  :character   Median :344.5   Median :36.00  
##  4      : 11                             Mean   :384.5   Mean   :33.55  
##  5      : 11                             3rd Qu.:511.2   3rd Qu.:50.00  
##  6      : 11                             Max.   :628.0   Max.   :64.00  
##  (Other):110

The data set contains observation from 6 rats and 11 observation of change in weight by Time. They are divided into 3 groups based on treatment. Weight in this case is the outcome variable in this longitudinal study. The idea is to analyse the weight difference in three group over time.

6.3 Plotting the data

ggplot(RATSL, aes(x = Time, y = Weight, group = ID)) +
  geom_line(aes(linetype = Group))+
  scale_x_continuous(name = "Time (days)", breaks = seq(0, 60, 10))+
  scale_y_continuous(name = "Weight (grams)")+
  theme(legend.position = "top")

# Draw the plot ##We plot the Weights of each rat by time and groups
# Rats are divided into three groups
ggplot(RATSL, aes(x = Time, y = Weight, linetype = Group, color = ID)) +
  geom_line() +
  scale_linetype_manual(values = rep(1:10, times=4)) +
  facet_grid(. ~ Group, labeller = label_both) +
  theme(legend.position = "none") + 
  scale_y_continuous(limits = c(min(RATSL$Weight), max(RATSL$Weight)))

From the plot we can see that Group 1 has the most rats with lowest weight even at starting point (time of recruitment). Group 2 the most incremental weight outcome compare to baseline but has only 4 rats. Group 3 has also 4 rats with almost same weight range as Group 2, however the weight doesn’t seem to increase significantly as group 2.

6.4 Standardizing for tracking

Higher baseline values means higher values throughout the study.This phenomenon is generally referred to as tracking.

The tracking phenomenon can be seen more clearly in a plot of the standardized values of each observation, i.e.,

\[standardised(x) = \frac{x - mean(x)}{ sd(x)}\]

# Standardize the variable weight by groups
RATSL <- RATSL %>%
  group_by(Time) %>%
  mutate(Weight_std = (scale(Weight))) %>%
  ungroup()

# Glimpse the data
glimpse(RATSL)
## Rows: 176
## Columns: 6
## $ ID         <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2…
## $ Group      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1,…
## $ WD         <chr> "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD…
## $ Weight     <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 555,…
## $ Time       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8,…
## $ Weight_std <dbl[,1]> <matrix[26 x 1]>
head(RATSL)
## # A tibble: 6 × 6
##   ID    Group WD    Weight  Time Weight_std[,1]
##   <fct> <fct> <chr>  <int> <int>          <dbl>
## 1 1     1     WD1      240     1         -1.00 
## 2 2     1     WD1      225     1         -1.12 
## 3 3     1     WD1      245     1         -0.961
## 4 4     1     WD1      260     1         -0.842
## 5 5     1     WD1      255     1         -0.882
## 6 6     1     WD1      260     1         -0.842
# Plot again with the standardized weight in RATSL 
ggplot(RATSL, aes(x = Time, y = Weight_std, linetype = Group, color =ID)) +
  geom_line() +
  scale_linetype_manual(values = rep(1:10, times=4)) +
  facet_grid(. ~ Group, labeller = label_both) +
  theme(legend.position = "none")

  scale_y_continuous(name = "standardized Weight")
## <ScaleContinuousPosition>
##  Range:  
##  Limits:    0 --    1

The weight difference looks similar now after the standardization,

6.5 Summary graphs

With large numbers of observations, graphical displays of individual response profiles are of little use and investigators then commonly produce graphs showing average (mean) profiles for each treatment group along with some indication of the variation of the observations at each time point, in this case the standard error of mean

\[se = \frac{sd(x)}{\sqrt{n}}\]

# Summary data with mean and standard error of RATSl Weight by group and time
RATSS <- RATSL %>%
  group_by(Group, Time) %>%
  summarise(mean = mean(Weight), se = (sd(Weight)/sqrt(length(Weight))) ) %>% #using formula above;
  ungroup()
## `summarise()` has grouped output by 'Group'. You can override using the
## `.groups` argument.
# Glimpse the data
glimpse(RATSL)
## Rows: 176
## Columns: 6
## $ ID         <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2…
## $ Group      <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1,…
## $ WD         <chr> "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD…
## $ Weight     <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 555,…
## $ Time       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8,…
## $ Weight_std <dbl[,1]> <matrix[26 x 1]>
# Plot the mean profiles
ggplot(RATSS, aes(x = Time, y = mean, color=Group, linetype = Group, shape = Group)) +
  geom_line() +
  scale_linetype_manual(values = c(1,2,3)) +
  geom_point(size=3) +
  scale_shape_manual(values = c(1,2,3)) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se, linetype="1"), width=0.3) +
  theme(legend.position = "right") +
  scale_y_continuous(name = "mean(Weight) +/- se(Weight)")

From the plot we can see ,All groups are independent and doesn’t seem to overlap with each other. There is a signifiant difference in Group 1 compared to Group 2 and Group 3. It is also clear that the weight of the rat seems to increase over time (observation) with a significant increase in Group 2 and 3.

6.6 Find the outlier using summary measure approach

Using the summary measure approach we will look into the post treatment values of the RATSL data set. Lets look into the mean weight for each rat. The mean of weeks will be our summary measure and we’ll plot boxplots of the mean for each diet group which is our treatment measure.

# Create a summary data by treatment and subject with mean as the summary variable (ignoring baseline week 0)
RATSL8S <- RATSL %>%
  filter(Time > 0) %>%
  group_by(Group, ID) %>%
  summarise(Weight_mean = mean(Weight)) %>%
  ungroup()
## `summarise()` has grouped output by 'Group'. You can override using the
## `.groups` argument.
# Glimpse the data
glimpse(RATSL8S)
## Rows: 16
## Columns: 3
## $ Group       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3
## $ ID          <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ Weight_mean <dbl> 261.0909, 237.6364, 260.1818, 266.5455, 269.4545, 274.7273…
# Draw a boxplot of the mean versus treatment

ggplot(RATSL8S, aes(x = Group, y = Weight_mean, color = Group)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape=23, size=4, fill = "white") +
  scale_y_continuous(name = "mean(Weight), Days 1-60")

From the box plot, we can see all three groups has outliers. Group 2 has a large one making the uneven distribution. The next step is to find and filter the outliers identified above.

# define outlier from group 3
g3 <- RATSL8S %>% filter(Group==3)
out3 <- min(g3$Weight_mean)

# Create a new data by filtering the outliers
RATSL8S2 <- RATSL8S %>%
  filter(250 < Weight_mean & Weight_mean < 560 & Weight_mean != out3)


# Draw a boxplot of the mean versus diet
ggplot(RATSL8S2, aes(x = Group, y = Weight_mean, col=Group)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape=23, size=4, fill = "white") +
  scale_y_continuous(name = "mean(Weight) by days")

### 6.7 T-test and ANOVA

Although the informal graphical material presented up to now has all indicated a lack of difference in the two treatment groups, most investigators would still require a formal test for a difference. Consequently we shall now apply a t-test to assess any difference between the treatment groups, and also calculate a confidence interval for this difference. We use the data without the outlier created above. The t-test confirms the lack of any evidence for a group difference. Also the 95% confidence interval is wide and includes the zero, allowing for similar conclusions to be made. However, T-test only tests for a statistical difference between two groups and in the dataset above we have 3 corresponding groups to be compared, we will therefore use a more stringent and diverse test ANOVA which compares differences among multiple groups. ANOVA assumes homogeniety of variance-the variance in the groups 1-3 should be similar

# Load original wide form rats data
RATSL <- as_tibble(read.table("https://raw.githubusercontent.com/KimmoVehkalahti/MABS/master/Examples/data/rats.txt", header = TRUE, sep = '\t'))
RATSL$ID <- factor(RATSL$ID)

# Add the baseline from the original data as a new variable to the summary data
join_vars <- c("ID","WD1")
RATSL8S3 <- RATSL8S2 %>%
  left_join(RATSL[join_vars], by='ID') 
# Rename column
RATSL8S3 <- RATSL8S3 %>%
  rename('Weight_baseline' = 'WD1')

# Fit the linear model with the mean Weight as the response 
fit2 <- lm(Weight_mean ~ Weight_baseline + Group, data = RATSL8S3)

# Compute the analysis of variance table for the fitted model with anova()
anova(fit2)
## Analysis of Variance Table
## 
## Response: Weight_mean
##                 Df Sum Sq Mean Sq  F value    Pr(>F)    
## Weight_baseline  1 174396  174396 7999.137 1.384e-14 ***
## Group            2   1707     853   39.147 3.628e-05 ***
## Residuals        9    196      22                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Looking at the ANOVA table, p-values < 0.05 considering a significance level p = 0.05 at 95% CI. There seem to be significant difference between the groups. The data however doesn’t tell us much about differences between which groups, i.e., multiple comparison. Usually data which follows the normal distribution curve are analysed with ANOVA followed by tukey test for multiple comparison. However in case of data which doesn’t follow the normal distribution curve, Kruskal Wallis followed by Dunn’s test for multiple comparison is conducted. Now assuming our data as normally distributed as we have been doing in this exercise, we can perform a tukey’s test for multiple comparison.

Analysis of BPRS data

Lets implement Implement the analyses of Chapter 9 of MABS, using the R codes of Exercise Set 6: Meet and Repeat: PART II, but using the BPRS data (from Chapter 8 and Meet and Repeat: PART I) as instructed in the Moodle.

BPRS data includes data pertaining to a brief psychiatric rating scale (BPRS) score prior to treatment and BPRS from 8 weeks during treatment. The patients (n=40) have been randomly assigned to treatment arm 1 or 2 and we are interested whether there is a difference in BPRS scores depending on the received treatment. A lower score means less symptoms.

6.8 Loading the data

Lets load and explore the data first

BPRSL <- read.table("BPRSL.csv", header = T, sep=",")

# Factor treatment & subject
BPRSL$treatment <- factor(BPRSL$treatment)
BPRSL$subject <- factor(BPRSL$subject)


# Glimpse the data

glimpse(BPRSL)
## Rows: 360
## Columns: 6
## $ X         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ treatment <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ subject   <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ weeks     <chr> "week0", "week0", "week0", "week0", "week0", "week0", "week0…
## $ bprs      <int> 42, 58, 54, 55, 72, 48, 71, 30, 41, 57, 30, 55, 36, 38, 66, …
## $ week      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
str(BPRSL)
## 'data.frame':    360 obs. of  6 variables:
##  $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ treatment: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ subject  : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ weeks    : chr  "week0" "week0" "week0" "week0" ...
##  $ bprs     : int  42 58 54 55 72 48 71 30 41 57 ...
##  $ week     : int  0 0 0 0 0 0 0 0 0 0 ...
summary(BPRSL)
##        X          treatment    subject       weeks                bprs      
##  Min.   :  1.00   1:180     1      : 18   Length:360         Min.   :18.00  
##  1st Qu.: 90.75   2:180     2      : 18   Class :character   1st Qu.:27.00  
##  Median :180.50             3      : 18   Mode  :character   Median :35.00  
##  Mean   :180.50             4      : 18                      Mean   :37.66  
##  3rd Qu.:270.25             5      : 18                      3rd Qu.:43.00  
##  Max.   :360.00             6      : 18                      Max.   :95.00  
##                             (Other):252                                     
##       week  
##  Min.   :0  
##  1st Qu.:2  
##  Median :4  
##  Mean   :4  
##  3rd Qu.:6  
##  Max.   :8  
## 
glimpse(BPRSL)
## Rows: 360
## Columns: 6
## $ X         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ treatment <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ subject   <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ weeks     <chr> "week0", "week0", "week0", "week0", "week0", "week0", "week0…
## $ bprs      <int> 42, 58, 54, 55, 72, 48, 71, 30, 41, 57, 30, 55, 36, 38, 66, …
## $ week      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

BPRSL data set has 360 observations and 6 variable. From the glimpse function, we can see the 6 columns, in which two treatment arms are coded 1 and 2 for treatment 1 and treatment 2. Subjects are coded from 1 to 20 however the repetition of same code in subjects suggests that participants were randomized to Treatment 1 or 2.

#Lets plot the data 
ggplot(BPRSL, aes(x = week, y = bprs, linetype = subject)) +
  geom_line() +
  scale_linetype_manual(values = rep(1:10, times=4)) +
  facet_grid(. ~ treatment, labeller = label_both) +
  theme(legend.position = "none") + 
  scale_y_continuous(limits = c(min(BPRSL$bprs), max(BPRSL$bprs)))

From the plot, it appears that BPRS seems to decrease during the treatment period in both treatment arms. A clear diffrerence cannot be validated however between groups from the plots.

6.9 Linear Mixed Effects Models

# lets create a regression model for BPRSL
BPRS_reg <-  lm(bprs ~ week + treatment, data = BPRSL)

# print summary of the model 
summary(BPRS_reg)
## 
## Call:
## lm(formula = bprs ~ week + treatment, data = BPRSL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.454  -8.965  -3.196   7.002  50.244 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  46.4539     1.3670  33.982   <2e-16 ***
## week         -2.2704     0.2524  -8.995   <2e-16 ***
## treatment2    0.5722     1.3034   0.439    0.661    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.37 on 357 degrees of freedom
## Multiple R-squared:  0.1851, Adjusted R-squared:  0.1806 
## F-statistic: 40.55 on 2 and 357 DF,  p-value: < 2.2e-16

We have BPRS score as our target variable and time (weeks) and treatment 1 and 2 as our explanatory variable. From the summary model, week variable seem to be statistically significant with BPRS but treatment variable doesn’t (p = 0.661). No significant difference can be seen in the difference in BPRS based on treatments However this analyses assumes independence of observations, i.e., the observation or outcome is not affected by any other confounder and is completely influenced by the explanatory variable which is not very rational. Therefore we now move on to a more stringent analyses which assumes observations ad dependent variable and can be influence by other effect. We will analyse the data set with both Fixed-effect models and Random-effect models.

#lets load the package 
library(lme4)

# Create a random intercept model
BPRS_ref <- lmer(bprs ~ week + treatment + (1 | subject), data = BPRSL, REML = FALSE)

# Print the summary of the model
summary(BPRS_ref)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bprs ~ week + treatment + (1 | subject)
##    Data: BPRSL
## 
##      AIC      BIC   logLik deviance df.resid 
##   2748.7   2768.1  -1369.4   2738.7      355 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0481 -0.6749 -0.1361  0.4813  3.4855 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept)  47.41    6.885  
##  Residual             104.21   10.208  
## Number of obs: 360, groups:  subject, 20
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  46.4539     1.9090  24.334
## week         -2.2704     0.2084 -10.896
## treatment2    0.5722     1.0761   0.532
## 
## Correlation of Fixed Effects:
##            (Intr) week  
## week       -0.437       
## treatment2 -0.282  0.000
confint(BPRS_ref)
## Computing profile confidence intervals ...
##                 2.5 %    97.5 %
## .sig01       4.951451 10.019188
## .sigma       9.486731 11.026604
## (Intercept) 42.608796 50.298982
## week        -2.679990 -1.860844
## treatment2  -1.542804  2.687248

Lets first reflect on our BPRS data set. Our maxima and minima is 95 and 18 respectively. Given the high variance, the baseline seems to differ from the outcomes.

Let’s fit the random intercept and random slope model to our data:

6.10 Random Intercept and Random Slope Model

Fitting a random intercept and random slope model allows the linear regression fits for each individual to differ in intercept but also in slope. This allows us to account for the individual differences in the individuals symptom (BRPS score) profiles, but also the effect of time.

# create a random intercept and random slope model
BPRS_ref1 <- lmer(bprs ~ week * treatment + (week | subject), data = BPRSL, REML = FALSE)

# print a summary of the model

summary(BPRS_ref1)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bprs ~ week * treatment + (week | subject)
##    Data: BPRSL
## 
##      AIC      BIC   logLik deviance df.resid 
##   2744.3   2775.4  -1364.1   2728.3      352 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0512 -0.6271 -0.0768  0.5288  3.9260 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  subject  (Intercept) 64.9964  8.0620        
##           week         0.9687  0.9842   -0.51
##  Residual             96.4707  9.8220        
## Number of obs: 360, groups:  subject, 20
## 
## Fixed effects:
##                 Estimate Std. Error t value
## (Intercept)      47.8856     2.2521  21.262
## week             -2.6283     0.3589  -7.323
## treatment2       -2.2911     1.9090  -1.200
## week:treatment2   0.7158     0.4010   1.785
## 
## Correlation of Fixed Effects:
##             (Intr) week   trtmn2
## week        -0.650              
## treatment2  -0.424  0.469       
## wek:trtmnt2  0.356 -0.559 -0.840
confint((BPRS_ref1))
## Computing profile confidence intervals ...
## Warning in FUN(X[[i]], ...): non-monotonic profile for .sig02
## Warning in confint.thpr(pp, level = level, zeta = zeta): bad spline fit
## for .sig02: falling back to linear interpolation
##                       2.5 %     97.5 %
## .sig01           5.39069597 12.1613571
## .sig02          -0.82694500  0.1404228
## .sig03           0.43747319  1.6543265
## .sigma           9.10743668 10.6349246
## (Intercept)     43.31885100 52.4522602
## week            -3.34923718 -1.9074295
## treatment2      -6.04400897  1.4617868
## week:treatment2 -0.07243289  1.5040996
# perform an ANOVA test on the two models to assess formal differences between them
anova(BPRS_ref1, BPRS_ref)
## Data: BPRSL
## Models:
## BPRS_ref: bprs ~ week + treatment + (1 | subject)
## BPRS_ref1: bprs ~ week * treatment + (week | subject)
##           npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## BPRS_ref     5 2748.7 2768.1 -1369.4   2738.7                       
## BPRS_ref1    8 2744.3 2775.4 -1364.1   2728.3 10.443  3    0.01515 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA test seems significant (p < 0.05). Addition of slope definitely increases the model fit. Its clear that addition of a random intercept model increased the inter individual variance. Treatment group seems unaffected over time. Lets see the slope model and see the outcomes.

6.11 Random Intercept and Random Slope Model with interaction

BPRS_ref2 <- lmer(bprs ~ week * treatment + (week | subject), data = BPRSL, REML = FALSE)

# print a summary of the model

summary(BPRS_ref2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: bprs ~ week * treatment + (week | subject)
##    Data: BPRSL
## 
##      AIC      BIC   logLik deviance df.resid 
##   2744.3   2775.4  -1364.1   2728.3      352 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0512 -0.6271 -0.0768  0.5288  3.9260 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  subject  (Intercept) 64.9964  8.0620        
##           week         0.9687  0.9842   -0.51
##  Residual             96.4707  9.8220        
## Number of obs: 360, groups:  subject, 20
## 
## Fixed effects:
##                 Estimate Std. Error t value
## (Intercept)      47.8856     2.2521  21.262
## week             -2.6283     0.3589  -7.323
## treatment2       -2.2911     1.9090  -1.200
## week:treatment2   0.7158     0.4010   1.785
## 
## Correlation of Fixed Effects:
##             (Intr) week   trtmn2
## week        -0.650              
## treatment2  -0.424  0.469       
## wek:trtmnt2  0.356 -0.559 -0.840
# perform an ANOVA test on the two models
anova(BPRS_ref2, BPRS_ref1)
## Data: BPRSL
## Models:
## BPRS_ref2: bprs ~ week * treatment + (week | subject)
## BPRS_ref1: bprs ~ week * treatment + (week | subject)
##           npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)
## BPRS_ref2    8 2744.3 2775.4 -1364.1   2728.3                    
## BPRS_ref1    8 2744.3 2775.4 -1364.1   2728.3     0  0

Finally, looking at the model, the two model and outcomes seems similar. Comparing the ANOVA test conforms there isn’t much significant difference between them. Adding the interaction variable as mentioned above in model 1 doesn’t seem to work out as the model didn’t change and the significance level hasn’t changed either.

6.12 Plotting the observed BPRS values and the fitted BPRS values

# draw the plot of BPRSL with the observed BPRS values
ggplot(BPRSL, aes(x = week, y = bprs, group = subject, col= treatment)) +
  geom_line(aes(linetype = treatment)) +
  scale_x_continuous(name = "Time (weeks)", breaks = seq(0, 4, 8)) +
  scale_y_continuous(name = "Observed BPRS") +
  theme(legend.position = "right") +
  facet_grid(. ~ treatment, labeller=label_both)

# Create a vector of the fitted values
Fitted <- fitted(BPRS_ref)
Fitted1 <- fitted(BPRS_ref1)
Fitted2 <- fitted(BPRS_ref2)

# Create a new column fitted to BPRSL
BPRSL <- BPRSL %>% mutate(bprs_fitted_values_BPRSL_ref = Fitted, bprs_fitted_values_BPRSL_ref1 = Fitted1, bprs_fitted_values_BPRSL_ref2 = Fitted2)
head(BPRSL)
##   X treatment subject weeks bprs week bprs_fitted_values_BPRSL_ref
## 1 1         1       1 week0   42    0                     53.19460
## 2 2         1       2 week0   58    0                     43.04516
## 3 3         1       3 week0   54    0                     43.98584
## 4 4         1       4 week0   55    0                     49.13483
## 5 5         1       5 week0   72    0                     58.19506
## 6 6         1       6 week0   48    0                     41.51037
##   bprs_fitted_values_BPRSL_ref1 bprs_fitted_values_BPRSL_ref2
## 1                      49.24017                      49.24017
## 2                      46.97380                      46.97380
## 3                      47.65582                      47.65582
## 4                      49.85313                      49.85313
## 5                      66.39001                      66.39001
## 6                      42.59363                      42.59363
 # draw the plot of BPRSL with the Fitted values of bprs model 1
ggplot(BPRSL, aes(x = week, y = bprs_fitted_values_BPRSL_ref, group = subject, col=treatment)) +
  geom_line(aes(linetype = treatment)) +
  scale_x_continuous(name = "Time (weeks)", breaks = seq(0, 4, 8)) +
  scale_y_continuous(name = "Fitted BPRS (model 1: rnd intercept)") +
  theme(legend.position = "right") +
  facet_grid(. ~ treatment, labeller=label_both)

# draw the plot of BPRSL with the Fitted values of bprs model 2
ggplot(BPRSL, aes(x = week, y = bprs_fitted_values_BPRSL_ref1, group = subject, col=treatment)) +
  geom_line(aes(linetype = treatment)) +
  scale_x_continuous(name = "Time (weeks)", breaks = seq(0, 4, 8)) +
  scale_y_continuous(name = "Fitted BPRS (model 2: rnd intercept + slope)") +
  theme(legend.position = "right") +
  facet_grid(. ~ treatment, labeller=label_both)

From the plot, we can see random intercept model differs from random intercept and slope model. Adding a slope intercept didn’t change the outcomes however. We can also see the final plot random intercept and slope with interaction model also different from all three model above. In conclusion we can say that random intercept model doesn’t highlight the individual’s effect on bprs over time and also the outcomes didn’t change with subsequent model.

******************************************************************* END ******************************************************************

DONE AND DUSTED \(\large \surd\) !!!

Well not really !! There is so much more to learn. This course was however a great “push” I have truly enjoyed the exercise sessions, the course material and the review exercises. I have learned so much and I am looking forward to learning more !!

Merry Christmas and a Happy New Year Everyone !!